
INT4 quantization


Gemma 3 27b It Quantized.w4a16

This is a quantized version of google/gemma-3-27b-it that accepts image and text input and produces text output. Its weights are quantized to INT4 while activations remain in 16-bit precision (the "w4a16" in the name), enabling efficient inference with vLLM.
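The "w4a16" scheme stores weights as 4-bit integers but computes with 16-bit activations. The following is a minimal, hypothetical sketch of that idea, assuming symmetric INT4 codes and one scale per output row; Python floats stand in for FP16/BF16, and the function names are illustrative only. Production engines such as vLLM typically use packed weight storage and optimized GPU kernels rather than anything like this loop.

```python
# Hypothetical sketch of a w4a16 matrix-vector product:
# weights are stored as signed INT4 codes plus one scale per output row,
# while activations stay in floating point (Python floats stand in for FP16/BF16).

INT4_MIN, INT4_MAX = -8, 7


def dequantize_row(codes: list[int], scale: float) -> list[float]:
    """Map INT4 integer codes back to approximate real-valued weights."""
    for c in codes:
        if not INT4_MIN <= c <= INT4_MAX:
            raise ValueError(f"code {c} is outside the INT4 range")
    return [c * scale for c in codes]


def w4a16_matvec(weight_codes: list[list[int]], scales: list[float],
                 activations: list[float]) -> list[float]:
    """Dequantize each weight row on the fly, then dot it with the activations."""
    out = []
    for codes, scale in zip(weight_codes, scales):
        row = dequantize_row(codes, scale)
        out.append(sum(w * x for w, x in zip(row, activations)))
    return out


codes = [[7, -8, 0], [1, 2, 3]]
scales = [0.5, 0.25]
x = [1.0, 2.0, 4.0]
print(w4a16_matvec(codes, scales, x))  # [-4.5, 4.25]
```

Only the weights shrink to 4 bits; the arithmetic itself still happens at activation precision, which is why this scheme mainly saves memory and memory bandwidth rather than compute.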

Qwen3 30B A3B Quantized.w4a16

INT4 quantized version of Qwen3-30B-A3B, reducing disk and GPU memory requirements by approximately 75% while maintaining high performance.

Large Language Model
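The "approximately 75%" figure follows from simple arithmetic: weight storage is parameters × bits ÷ 8 bytes, and going from 16 to 4 bits keeps one quarter of it. A back-of-the-envelope sketch, assuming about 30B total parameters as the model name suggests; for a mixture-of-experts model like Qwen3-30B-A3B all experts must be resident in memory, so the total count matters here, not the roughly 3B parameters active per token. Scales, unquantized layers, the KV cache, and activations are ignored.

```python
# Back-of-the-envelope weight-memory estimate: bytes = parameters * bits / 8.
# Assumes every parameter is quantized and ignores scales, zero points,
# unquantized layers, the KV cache, and activations.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 30e9  # assumed ~30B total parameters, as the model name suggests
bf16_gb = weight_memory_gb(params, 16)
int4_gb = weight_memory_gb(params, 4)
reduction = 1 - int4_gb / bf16_gb
print(f"16-bit: {bf16_gb:.1f} GB, INT4: {int4_gb:.1f} GB, saving {reduction:.0%}")
# 16-bit: 60.0 GB, INT4: 15.0 GB, saving 75%
```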

Qwen3 32B Quantized.w4a16

INT4 quantized version of Qwen3-32B, reducing disk and GPU memory requirements by approximately 75% through weight quantization while maintaining high performance.

Large Language Model

Deepseek R1 Quantized.w4a16

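INT4 weight-quantized version of DeepSeek-R1. Because the original weights are stored in FP8, reducing them to 4 bits cuts GPU memory and disk space requirements by approximately 50% (rather than 75%), while maintaining performance close to the original model.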

Large Language Model
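The saving from INT4 depends on the precision of the original checkpoint. DeepSeek-R1's weights are distributed in FP8, so halving the bit width gives about 50%, whereas models released in BF16 (such as Qwen3 and Gemma 3) save about 75%. A small sketch of that ratio, ignoring quantization metadata such as scales; the function name is hypothetical.

```python
# The relative saving from INT4 depends on the precision of the original weights:
# reduction = 1 - new_bits / original_bits (quantization metadata ignored).

def relative_reduction(original_bits: int, quantized_bits: int = 4) -> float:
    """Fraction of weight storage saved when moving to `quantized_bits`."""
    if not 0 < quantized_bits <= original_bits:
        raise ValueError("quantized_bits must be positive and <= original_bits")
    return 1 - quantized_bits / original_bits


print(f"BF16 -> INT4: {relative_reduction(16):.0%}")  # 75%, e.g. Qwen3, Gemma 3
print(f"FP8  -> INT4: {relative_reduction(8):.0%}")   # 50%, e.g. DeepSeek-R1
```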

Gemma 3 12b It GPTQ 4b 128g

This model is an INT4 quantized version of google/gemma-3-12b-it. It uses the GPTQ algorithm to reduce the weights from 16-bit to 4-bit precision, with a group size of 128 ("4b 128g"), significantly decreasing disk space and GPU memory requirements.
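The "128g" suffix refers to group-wise quantization: every block of 128 consecutive weights shares one scale. The sketch below shows that grouping with plain round-to-nearest (RTN) quantization, which is a deliberate simplification and not GPTQ itself. GPTQ additionally updates the not-yet-quantized weights to compensate for each rounding error using second-order (Hessian) information from calibration data. The function names and the symmetric scheme are assumptions for illustration.

```python
import random

# Group-wise symmetric INT4 quantization with group size 128 (the "128g").
# NOTE: this is plain round-to-nearest (RTN). GPTQ additionally adjusts the
# remaining weights to compensate for each rounding error using second-order
# (Hessian) information from calibration data; that step is not shown here.

GROUP_SIZE = 128
QMAX = 7  # with scale = max|w| / 7, codes land in [-7, 7]


def quantize_groups(weights, group_size=GROUP_SIZE):
    """Return (codes, scales): one scale per group of `group_size` weights."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / QMAX if max_abs > 0 else 1.0
        scales.append(scale)
        # Clamp to the signed INT4 range [-8, 7] as a safety net.
        codes.extend(max(-QMAX - 1, min(QMAX, round(w / scale))) for w in group)
    return codes, scales


def dequantize_groups(codes, scales, group_size=GROUP_SIZE):
    """Reconstruct approximate weights from INT4 codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


random.seed(0)
w = [random.uniform(-1, 1) for _ in range(256)]
codes, scales = quantize_groups(w)
w_hat = dequantize_groups(codes, scales)
print(len(scales))  # 2 groups of 128 weights
print(max(abs(a - b) for a, b in zip(w, w_hat)))  # worst-case rounding error
```

Smaller groups track local weight ranges more closely (lower error) at the cost of storing more scales; 128 is a common middle ground.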

Whisper Large V3 Turbo Quantized.w4a16

An INT4 weight-quantized version of openai/whisper-large-v3-turbo for efficient audio-to-text (speech recognition) tasks.

Speech Recognition

Transformers, English

Mistral Small 3.1 24B Instruct 2503 GPTQ 4b 128g

This model is an INT4 quantized version of Mistral-Small-3.1-24B-Instruct-2503, using the GPTQ algorithm to reduce weights from 16-bit to 4-bit, significantly decreasing disk size and GPU memory requirements.

Large Language Model
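Because each group of 128 weights also stores a scale, "4b 128g" costs slightly more than 4 bits per weight, so real savings land a little under the nominal 75%. A sketch of that overhead, assuming a hypothetical layout with one 16-bit scale per group and an optional 4-bit zero point; actual checkpoint formats may differ. The 24B parameter count is taken from the model name as an approximation.

```python
# Effective storage cost of "4b 128g" weights once per-group metadata is counted.
# Assumptions (hypothetical layout): one 16-bit scale per group and, optionally,
# one 4-bit zero point per group; real checkpoint formats may differ.

def effective_bits_per_weight(weight_bits=4, group_size=128,
                              scale_bits=16, zero_point_bits=0):
    """Bits per weight including amortized per-group scale and zero point."""
    return weight_bits + (scale_bits + zero_point_bits) / group_size


def quantized_size_gb(num_params, **layout):
    """Approximate weight storage in GB (10**9 bytes) for a given layout."""
    return num_params * effective_bits_per_weight(**layout) / 8 / 1e9


print(effective_bits_per_weight())                   # symmetric: 4.125
print(effective_bits_per_weight(zero_point_bits=4))  # with zero point: 4.15625
params = 24e9  # assumed ~24B parameters, per the model name
print(f"{quantized_size_gb(params):.1f} GB vs {params * 16 / 8 / 1e9:.0f} GB in 16-bit")
```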

Gemma 3 27b It GPTQ 4b 128g

This model is an INT4 quantized version of google/gemma-3-27b-it. It uses the GPTQ algorithm to reduce the weights from 16-bit to 4-bit precision, with a group size of 128, decreasing disk and GPU memory requirements.


© 2025 AIbase